Methods, apparatuses, and devices for generating word vectors

ABSTRACT

Implementations of the present specification disclose methods, apparatuses, and devices for generating word vectors. The method includes: obtaining words by segmenting a corpus; establishing a feature vector for each obtained word based on the n-ary characters corresponding to the obtained word; training a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generating a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is a continuation of PCT Application No. PCT/CN2019/073005, filed Jan. 24, 2019, which claims priority to Chinese Patent Application No. 201810111369.8, filed Feb. 5, 2018, and entitled “METHODS, APPARATUSES, AND DEVICES FOR GENERATING WORD VECTORS,” which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present specification relates to the technical field of computer software, and in particular, to methods, apparatuses, and devices for generating word vectors.

BACKGROUND

Most of today's natural language processing solutions employ a neural network-based architecture in which an important underlying technology is the word vector. A word vector is a vector that maps a word to a fixed number of dimensions, where the vector indicates the semantic information of the word.

In existing technologies, commonly used algorithms for generating word vectors include, for example, Google's word vector algorithm, Microsoft's deep neural network algorithm, etc.

In view of the existing technologies, there is a need for more accurate solutions for generating word vectors.

BRIEF SUMMARY

Implementations of the present specification provide methods, apparatuses, and devices for generating word vectors, to provide a more reliable solution for generating word vectors.

The implementations of the present specification provide the following solutions:

An implementation of the present specification provides a method for generating word vectors, including: obtaining words by segmenting a corpus; establishing a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character; training a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generating a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

An implementation of the present specification provides an apparatus for generating word vectors, including: an acquisition module, configured to obtain words by segmenting a corpus; an establishment module, configured to establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character; a training module, configured to train a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and a generation module, configured to generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

An implementation of the present specification provides another method for generating word vectors, including:

Step 1: Establishing a vocabulary of words obtained by segmenting a corpus, where the obtained words exclude any word that appears fewer than a predetermined or dynamically determined number of times in the corpus.

Step 2: Determining a total number of n-ary characters corresponding to all of the obtained words, where identical n-ary characters are counted once, and each n-ary character represents n consecutive characters of an obtained word corresponding to the n-ary character.

Step 3: Establishing, for each obtained word, based on each n-ary character corresponding to the obtained word, a feature vector whose dimensionality is the total number, where each dimension of the feature vector corresponds to a different n-ary character, and the value of each dimension indicates whether a corresponding n-ary character is mapped to the obtained word corresponding to the feature vector.

Step 4: Traversing the corpus on which word segmentation is performed, and performing step 5 with respect to the current word accessed during the traversal, and if the traversal is completed, performing step 6; otherwise continuing the traversal.

Step 5: Using the current word as a center, sliding towards both sides for at most k words to establish a window, and using the words in the window except the current word as context words; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation, and inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word and a feature vector of a negative sample word selected in the corpus into the full connection layer of the convolutional neural network for calculation to obtain a second vector and a third vector; and updating parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and a specified loss function, where

the convolutional calculation is performed according to the following formula:

$\tilde{x}_{i} = x_{i:i+\theta-1} = [x_{i}^{T}, x_{i+1}^{T}, \ldots, x_{i+\theta-1}^{T}]^{T}$

$y_{i} = \sigma(\omega \tilde{x}_{i} + \zeta)$

the pooling calculation is performed according to the following formula:

$c(j) = \max\limits_{i = 1, 2, \ldots, t-\theta+1} \{ y_{i}(j) \}$, or $c(j) = \underset{i = 1, 2, \ldots, t-\theta+1}{\mathrm{average}} \{ y_{i}(j) \}$

and the loss function includes:

$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left(1 + \sum\limits_{m=1}^{\lambda} \exp\left(-\gamma \cdot \left(s(w, c) - s(w_{m}^{\prime}, c)\right)\right)\right)$

where $x_{i}$ indicates a feature vector of the i-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the i-th to the (i+θ−1)-th context words; $y_{i}$ indicates the i-th element in the vector that is obtained through the convolutional calculation; ω indicates weight parameters of the convolutional layer; ζ indicates offset parameters of the convolutional layer; σ indicates an activation function; max indicates a maximum value function; average indicates an average value function; c(j) indicates the j-th element in the first vector that is obtained through pooling calculation; t indicates the number of context words; c indicates the first vector; w indicates the second vector; $w_{m}^{\prime}$ indicates the third vector corresponding to the m-th negative sample word; ς indicates weight parameters of the full connection layer; τ indicates offset parameters of the full connection layer; γ indicates a hyperparameter; s indicates a similarity calculation function; and λ indicates the number of negative sample words.

Step 6: Inputting the feature vector of each obtained word into the full connection layer of the trained convolutional neural network for calculation to obtain corresponding word vectors.

An implementation of the present specification provides a device for generating word vectors, including: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: obtain words by segmenting a corpus; establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character; train a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

At least one of the technical solutions adopted in the implementations of the present specification can achieve the following beneficial effects: The convolutional neural network can depict the context and overall semantic information of the word through the convolutional calculation and the pooling calculation and extract more context and semantic information, and the n-ary characters can express the word more finely, so that the word vector can be generated more accurately.

BRIEF DESCRIPTION OF DRAWINGS

To describe the technical solutions in the implementations of the present specification or in the existing technologies more clearly, the following is a brief introduction of the accompanying drawings for illustrating such technical solutions. Clearly, the accompanying drawings described below show merely some implementations of the present specification, and a person skilled in the art can derive other drawings from such accompanying drawings without making innovative efforts.

FIG. 1 is a schematic diagram illustrating an overall architecture involved in an actual application scenario of solutions of the present specification;

FIG. 2 is a schematic flowchart illustrating a method for generating word vectors according to some implementations of the present specification;

FIG. 3 is a schematic diagram illustrating a feature vector of an English word in an application scenario according to some implementations of the present specification;

FIG. 4 is a schematic diagram illustrating a convolutional neural network in an actual application scenario according to some implementations of the present specification;

FIG. 5 is a schematic flowchart illustrating another method for generating word vectors according to some implementations of the present specification; and

FIG. 6 is a schematic structural diagram illustrating an apparatus for generating word vectors that corresponds to FIG. 2 according to some implementations of the present specification.

DETAILED DESCRIPTION

Implementations of the present specification provide methods, apparatuses, and devices for generating word vectors.

To enable a person skilled in the art to better understand technical solutions in the present specification, the following clearly and completely describes the technical solutions in the implementations of the present specification with reference to the drawings that accompany the implementations. Clearly, the described implementations are merely some rather than all of the implementations of the present application. Based on the implementations of the present specification, all other implementations obtained by a person of ordinary skill in the art without making innovative efforts shall fall within the protection scope of the present application.

FIG. 1 is a schematic diagram of an overall architecture involved in an actual application scenario of solutions of the present specification. The overall architecture relates primarily to a server that is used to train a convolutional neural network to generate word vectors. A feature vector of each obtained word can be established based on an n-ary character, and a convolutional neural network can be trained based on the feature vectors and the context of the words. The feature vectors can be established by the server or another device.

The solutions of the present specification are applicable to languages that are formed by alphabetic letters, such as English, French, German, and Spanish, and are also applicable to languages that are formed by non-alphabetic elements but can be easily mapped to alphabetic letters, such as Chinese (which can be mapped to pinyin letters) and Japanese (which can be mapped to Roman letters). For ease of description, in the following implementations, the solutions of the present specification are mainly described with respect to scenarios of English.

FIG. 2 is a schematic flowchart illustrating a method for generating word vectors according to some implementations of the present specification. From the perspective of a device, an actor that executes the process includes, for example, at least one of a personal computer, a large- or medium-sized computer, a computer cluster, a mobile phone, a tablet computer, a smart wearable device, a vehicle, etc.

The process in FIG. 2 includes the following steps:

S202: Obtain individual words by segmenting a corpus.

In some implementations of the present specification, more specifically, the obtained words can be at least some of the words that appear at least once in the corpus. To facilitate subsequent processing, each word can be stored in a vocabulary, and individual words can be read from the vocabulary for use.

Note that if a word appears too few times in the corpus, the number of corresponding iterations in the subsequent processing is also small and the reliability of the training result is relatively low, so such a word can be screened out. In this case, more specifically, the obtained words are some of the words that appear in the corpus, namely those that appear at least a specified number of times. The specified number can be manually defined, or automatically determined based on a frequency distribution of words that appear in the corpus.
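As a minimal sketch of this screening step, the following Python fragment counts word occurrences and keeps only words that reach a threshold. The function name `build_vocabulary`, the parameter `min_count`, and the whitespace-segmented toy corpus are assumptions made for illustration; the specification does not prescribe a particular threshold or segmentation tool.

```python
from collections import Counter

def build_vocabulary(tokens, min_count=2):
    """Return the sorted words that occur at least min_count times."""
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n >= min_count)

# Toy corpus, already segmented on whitespace (illustrative only).
corpus = "the boy saw the dog and the dog saw the boy today".split()
vocabulary = build_vocabulary(corpus, min_count=2)
print(vocabulary)
```

Here "and" and "today" each appear once and are screened out, while the remaining four words are kept.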

S204: Establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character.

In this implementation of the present specification, characters of the word can include the characters constituting the word, or other characters to which the characters constituting the word are mapped. For example, the word “boy” includes characters “b”, “o”, and “y”.

To express the order of words, some mark characters can be added to the original words based on certain rules, and these mark characters can also be considered characters of the words. For example, a mark character can be added to a position such as a start position and/or an end position of the original word, and after such marking, the word “boy” can take the form of “#boy#”, and the two “#” can be considered characters of the word “boy”.

Further, n is an integer not less than 1. Using “#boy#” as an example, the word includes the following 3-ary characters: “#bo” (the 1st to 3rd characters), “boy” (the 2nd to 4th characters), and “oy#” (the 3rd to 5th characters); and includes the following 4-ary characters: “#boy” (the 1st to 4th characters) and “boy#” (the 2nd to 5th characters).
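The enumeration above can be sketched in a few lines of Python. The function name `char_ngrams` and its parameters are hypothetical; the sketch assumes one mark character “#” is added at both the start and the end of the word, as in the “boy” example.

```python
def char_ngrams(word, n_values=(3,), mark="#"):
    """List the n-ary characters of a word after adding mark characters."""
    marked = f"{mark}{word}{mark}"
    grams = []
    for n in n_values:
        # Slide a window of n consecutive characters over the marked word.
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams

three_ary = char_ngrams("boy", n_values=(3,))
three_and_four_ary = char_ngrams("boy", n_values=(3, 4))
print(three_ary)
print(three_and_four_ary)
```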

In some implementations of the present specification, the value of n can be dynamically adjusted. For the same word, at the time of determining the n-ary characters corresponding to the word, “n” can have a single value, e.g., n=3 means that only 3-ary characters corresponding to the word are determined, or have multiple values, e.g., n=3, 4 means that both 3-ary characters and 4-ary characters corresponding to the word are determined.

To facilitate computer processing, n-ary characters can be represented based on specified codes (e.g., numbers). For example, different characters or different n-ary characters can be represented by different codes or code strings.

In some implementations of the present specification, the feature vector of a word can have values assigned to different dimensions to indicate the n-ary characters corresponding to the word. More precisely, the feature vector of the word can also be used to indicate the order of the n-ary characters corresponding to the word.

S206: Train a convolutional neural network based on the feature vectors of the obtained words and of their context words in the corpus.

In some implementations of the present specification, the convolutional layer of the convolutional neural network is used for extracting local information from neurons, and the pooling layer of the convolutional neural network is used for synthesizing all local information of the convolutional layer to obtain global information. Specifically, in scenarios of the present specification, local information can refer to overall semantics of some context words associated with the current word (each obtained word can be used as the current word), and global information can refer to overall semantics of all context words associated with the current word.

S208: Generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

By training the convolutional neural network, appropriate parameters can be determined for the convolutional neural network, so that the convolutional neural network can more accurately depict the overall semantics of the context words and the corresponding semantics of the current words. The parameters include, for example, weight parameters and offset parameters.

The word vector can be obtained by making inferences from the feature vector based on the trained convolutional neural network.

By using the method in FIG. 2, the convolutional neural network can depict the overall semantic information of the context of the word through convolutional calculation and pooling calculation and extract more semantic information of the context, and the n-ary characters can express the word more finely, so that the word vector can be generated more accurately.

Based on the method shown in FIG. 2, some implementations of the present specification further provide some implementation solutions of the method, as well as extension solutions, which are described below.

In some implementations of the present specification, for step S204, the establishing of a feature vector for each obtained word based on the n-ary characters corresponding to the obtained word can specifically include:

determining a total number of different n-ary characters among all n-ary characters corresponding to the obtained words; and establishing, for each word, a feature vector whose dimensionality is determined based on the total number, where the feature vector can be assigned values in different dimensions to indicate the n-ary characters corresponding to the word.

For example, all of the n-ary characters corresponding to the obtained words are indexed from 0 at an increment of 1. The same n-ary characters have the same index number. Assuming that the total number is N_c, the index number of the last n-ary character is N_c−1. A feature vector whose dimension is N_c is established for each word. Specifically, assuming that n=3, and that the indexes of all the 3-ary characters corresponding to a word are 2, 34, and 127, then the elements at indexes 2, 34, and 127 in the feature vector established for that word can be 1, and the remaining elements are 0. Using the same example, in some embodiments, the values assigned to the elements at indexes 2, 34, and 127 in the feature vector can be different from one another, to reflect different weights or to indicate an order of the 3-ary characters corresponding to the word.
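The indexing scheme can be illustrated with a small Python sketch. The helper names `build_ngram_index` and `feature_vector` are hypothetical, and the sketch assumes n=3, the mark character “#”, and the simple 0/1 assignment (not the weighted or order-aware variants mentioned above).

```python
def build_ngram_index(words, n=3, mark="#"):
    """Assign each distinct n-ary character an index 0, 1, ..., N_c - 1."""
    index = {}
    for word in words:
        marked = f"{mark}{word}{mark}"
        for i in range(len(marked) - n + 1):
            # setdefault keeps the first index, so repeats are counted once.
            index.setdefault(marked[i:i + n], len(index))
    return index

def feature_vector(word, index, n=3, mark="#"):
    """Build an N_c-dimensional 0/1 vector marking the word's n-ary characters."""
    vec = [0] * len(index)
    marked = f"{mark}{word}{mark}"
    for i in range(len(marked) - n + 1):
        gram = marked[i:i + n]
        if gram in index:
            vec[index[gram]] = 1
    return vec

index = build_ngram_index(["boy", "toy"])
boy_vec = feature_vector("boy", index)
toy_vec = feature_vector("toy", index)
print(len(index), boy_vec, toy_vec)
```

In this toy vocabulary “boy” and “toy” share the 3-ary character “oy#”, so N_c is 5 rather than 6 and both vectors have a 1 in the same dimension.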

More intuitively, based on the above example, this implementation of the present specification provides a schematic diagram illustrating a feature vector of an English word in an application scenario, as shown in FIG. 3. The English word is “boy”, and a mark character “#” is added at the start position and the end position. f indicates the process of establishing a feature vector based on the word. The feature vector is, for example, a column vector that is established based on each 3-ary character of “boy”. It can be seen that the values of three elements in the feature vector are 1s, which respectively indicate the 3-ary characters “#bo”, “boy”, and “oy#”; and the values of other elements are 0s, which indicates that “boy” does not correspond to any other 3-ary character.

In this implementation of the present specification, when the convolutional neural network is trained, the goal is to ensure that the similarity between the feature vectors of the current word and its context words is relatively high after inferences are made based on the trained convolutional neural network.

Further, context words are regarded as positive sample words, and, as a contrast, one or more negative sample words associated with the current word can be selected based on certain rules to be involved in the training, so as to ensure fast training convergence and obtain more accurate training results. In this case, the goal can further include ensuring that the similarity between the feature vectors of the current word and the negative sample words is relatively low after inferences have been made based on the trained convolutional neural network. Negative sample words can be selected randomly in the corpus, or selected from the non-context words, etc. The present specification does not limit the specific ways of calculating such similarity. For example, the similarity can be calculated based on a cosine operation of the angle between vectors, or based on the square sum operation of the vectors, etc.
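Two of the options mentioned above, cosine similarity and selecting negative sample words from the non-context words, can be sketched as follows. The function names, the zero-vector convention, and the use of a seeded `random.Random` are assumptions of this sketch, not requirements of the specification.

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def sample_negatives(vocabulary, excluded, count, rng):
    """Pick count words at random from the vocabulary, skipping excluded ones."""
    candidates = [word for word in vocabulary if word not in excluded]
    return rng.sample(candidates, count)

vocabulary = ["as", "the", "liquid", "year", "make", "vegan", "absorbs"]
excluded = {"liquid", "as", "the", "vegan", "absorbs"}  # current word + context
negatives = sample_negatives(vocabulary, excluded, 2, random.Random(0))
print(sorted(negatives), cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```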

According to the above analysis, for step S206, the convolutional neural network is trained based on the feature vectors of the obtained words as well as the feature vectors of their context words in the corpus. Specifically, the training can include:

training the convolutional neural network based on the feature vectors of the obtained words as well as the feature vectors of their context words and their negative sample words in the corpus.

In some implementations of the present specification, the training process of the convolutional neural network can be iteratively performed. A relatively simple way is to traverse the corpus after word segmentation, and each time one of the obtained words is accessed, an iteration is performed, until the traversal is complete. The convolutional neural network can then be considered to have been trained using the corpus.

Specifically, the training of the convolutional neural network based on the feature vectors of the obtained words and the feature vectors of their context words and negative sample words in the corpus can include:

traversing the corpus after word segmentation, and executing the following for the current word accessed during the traversal (the content of this execution corresponds to a process during one iteration):

for the current word obtained after word segmentation, determining one or more context words and negative sample words in the corpus; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation; inputting the result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word into the full connection layer of the convolutional neural network for calculation to obtain a second vector, and inputting a feature vector of a negative sample word associated with the current word into the full connection layer of the convolutional neural network for calculation to obtain a third vector; and updating parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and a specified loss function.

More intuitively, the above process is described with reference to FIG. 4. FIG. 4 is a schematic diagram illustrating a convolutional neural network in an actual application scenario according to some implementations of the present specification.

The convolutional neural network in FIG. 4 mainly includes a convolutional layer, a pooling layer, a full connection layer, and a Softmax layer. In the process of training the convolutional neural network, the feature vectors of the context words are processed by the convolutional layer and the pooling layer to extract the semantic information of the context words as a whole, and the feature vectors of the current word and its negative sample words can be processed by the full connection layer. Detailed description is provided below.

In this implementation of the present specification, it is assumed that a sliding window is used to determine the context words, the center of the sliding window is the current word being accessed during the traversal, and the words in the sliding window other than the current word are context words. The feature vectors of all the context words are input to the convolutional layer, and then convolutional calculation can be performed according to the following formula:

$\tilde{x}_{i} = x_{i:i+\theta-1} = [x_{i}^{T}, x_{i+1}^{T}, \ldots, x_{i+\theta-1}^{T}]^{T}$

$y_{i} = \sigma(\omega \tilde{x}_{i} + \zeta)$

where $x_{i}$ indicates a feature vector (which is assumed to be a column vector) of the i-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the i-th to the (i+θ−1)-th context words; $y_{i}$ indicates the i-th element in the vector (the result of the convolutional calculation) obtained through the convolutional calculation; ω indicates the weight parameter(s) of the convolutional layer; ζ indicates the offset parameter(s) of the convolutional layer; and σ indicates an activation function, for example, if the Sigmoid function is used,

$\sigma(x) = \frac{1}{1 + e^{-x}}.$
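The convolutional calculation can be sketched in plain Python as follows. This is only an illustration under stated assumptions: ω is represented as a list of rows, each of length θ·N_c, ζ holds one offset per row, and the Sigmoid function is used as the activation. The names `convolve`, `omega`, and `zeta` and the toy values are hypothetical.

```python
import math

def sigmoid(z):
    """Sigmoid activation, one possible choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def convolve(context_vectors, omega, zeta, theta):
    """Return [y_1, ..., y_(t-theta+1)] for t context feature vectors."""
    t = len(context_vectors)
    outputs = []
    for i in range(t - theta + 1):
        # Concatenate the feature vectors of theta consecutive context words.
        x_tilde = [v for vec in context_vectors[i:i + theta] for v in vec]
        y_i = [sigmoid(sum(w * x for w, x in zip(row, x_tilde)) + b)
               for row, b in zip(omega, zeta)]
        outputs.append(y_i)
    return outputs

# Toy input: t = 4 context words, N_c = 2, theta = 2, one output dimension.
context = [[1, 0], [0, 1], [1, 1], [0, 0]]
omega = [[1.0, -1.0, 0.0, 0.0]]
zeta = [0.0]
y = convolve(context, omega, zeta, theta=2)
print(len(y), y)
```

With t=4 and θ=2 the window takes t−θ+1=3 positions, so the result contains three vectors, one per position.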

Further, after the result of the convolutional calculation is obtained, the result can be input into the pooling layer for pooling calculation. Specifically, maximum pooling calculation or average pooling calculation can be used.

For maximum pooling calculation, as an example, the following formula is used:

$c(j) = \max\limits_{i = 1, 2, \ldots, t-\theta+1} \{ y_{i}(j) \}$

For average pooling calculation, as an example, the following formula is used:

$c(j) = \underset{i = 1, 2, \ldots, t-\theta+1}{\mathrm{average}} \{ y_{i}(j) \}$

where max indicates a maximum value function; average indicates an average value function; c(j) indicates the j-th element in the first vector obtained through the pooling calculation; and t indicates the number of context words.
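Both pooling variants reduce the t−θ+1 convolution outputs to a single vector by operating on each dimension j separately. A minimal sketch, assuming the convolution outputs are given as a list of equal-length lists (the names `max_pool` and `average_pool` are hypothetical):

```python
def max_pool(conv_outputs):
    """c(j) = maximum of y_i(j) over all window positions i."""
    return [max(column) for column in zip(*conv_outputs)]

def average_pool(conv_outputs):
    """c(j) = average of y_i(j) over all window positions i."""
    return [sum(column) / len(column) for column in zip(*conv_outputs)]

# Two window positions, two output dimensions (toy values).
conv_outputs = [[1.0, 4.0], [3.0, 2.0]]
c_max = max_pool(conv_outputs)
c_avg = average_pool(conv_outputs)
print(c_max, c_avg)
```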

FIG. 4 also exemplarily illustrates the current word “liquid” in a corpus, 6 context words “as”, “the”, “vegan”, “gelatin”, “substitute”, and “absorbs” associated with the current word in the corpus, and 2 negative sample words “year” and “make” associated with the current word in the corpus. In FIG. 4, it is assumed that all the feature vectors established based on the n-ary characters have N_c dimensions, and θ=3 indicates the length of the convolution window. Then, the vectors obtained through concatenation during the convolutional calculation have θ·N_c=3·N_c dimensions.

The feature vector of the current word can be input to the full connection layer, and can be calculated according to, for example, the following formula:

$w = \sigma(\varsigma \cdot q + \tau)$

where w indicates the second vector that is output by the full connection layer after processing the feature vector of the current word; ς indicates the weight parameter(s) of the full connection layer; q indicates the feature vector of the current word; and τ indicates the offset parameter(s) of the full connection layer.
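A minimal sketch of this full connection calculation, assuming ς is a list of rows with one row per output dimension, τ holds one offset per row, and the Sigmoid function is the activation. The function name and the toy values are hypothetical.

```python
import math

def full_connection(q, varsigma, tau):
    """Compute w = sigma(varsigma . q + tau) with a Sigmoid activation."""
    return [1.0 / (1.0 + math.exp(-(sum(s * x for s, x in zip(row, q)) + b)))
            for row, b in zip(varsigma, tau)]

# Toy values: N_c = 3 input dimensions, 2 output dimensions.
q = [1, 0, 1]
varsigma = [[0.5, 0.1, -0.5], [1.0, 2.0, 1.0]]
tau = [0.0, -2.0]
w = full_connection(q, varsigma, tau)
print(w)
```

Both pre-activation sums are 0 in this toy case, so each output equals the Sigmoid of 0.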

Similarly, for each negative sample word, a respective feature vector can be input to the full connection layer and processed in the same way as the current word is processed, to obtain a third vector, and the third vector corresponding to the m-th negative sample word is denoted as $w_{m}^{\prime}$. In other words, multiple third vectors can be obtained and each third vector is generated based on a different negative sample word associated with the current word.

Further, the updating of the parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and the specified loss function can include, for example, calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and updating the parameters of the convolutional neural network based on the first similarity, the second similarity, and the specified loss function.

The loss function can be, for example:

$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left(1 + \sum\limits_{m=1}^{\lambda} \exp\left(-\gamma \cdot \left(s(w, c) - s(w_{m}^{\prime}, c)\right)\right)\right)$

where c indicates the first vector; w indicates the second vector; $w_{m}^{\prime}$ indicates the third vector corresponding to the m-th negative sample word; ω indicates weight parameter(s) of the convolutional layer; ζ indicates offset parameter(s) of the convolutional layer; ς indicates weight parameter(s) of the full connection layer; τ indicates offset parameter(s) of the full connection layer; γ indicates a hyperparameter; s indicates a similarity calculation function; and λ indicates the number of negative sample words.
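To make the behavior of this loss concrete, the following sketch evaluates it for given vectors. It assumes cosine similarity for s and γ=1.0 as a default; both are illustrative choices, and the function names are hypothetical. The parameter update itself (for example, by gradient descent) is not shown.

```python
import math

def cosine_similarity(u, v):
    """One possible similarity function s (an assumption of this sketch)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def loss(w, c, negatives, gamma=1.0, s=cosine_similarity):
    """log(1 + sum over m of exp(-gamma * (s(w, c) - s(w'_m, c))))."""
    positive = s(w, c)
    total = sum(math.exp(-gamma * (positive - s(w_neg, c)))
                for w_neg in negatives)
    return math.log(1.0 + total)

c = [1.0, 0.0]                            # first vector (pooled context)
w = [1.0, 0.0]                            # second vector (current word)
well_separated = loss(w, c, [[-1.0, 0.0]])  # negative points away from c
not_separated = loss(w, c, [[1.0, 0.0]])    # negative is as similar as w
print(well_separated, not_separated)
```

The loss is smaller when the current word is more similar to the context than the negative sample word is, which matches the training goal described earlier.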

In practice, if no negative sample word is used, the term for calculating the similarity between the first vector and the third vector can be removed from the loss function used.

In some implementations of the present specification, after training of the convolutional neural network, a word vector can be generated by making inferences from the feature vector. Specifically, for step S208, the generating of a word vector for each obtained word based on the feature vector of each obtained word and the trained convolutional neural network can specifically include:

inputting the feature vector of each obtained word into the full connection layer of the trained convolutional neural network for calculation to obtain a vector output, which is considered a corresponding word vector.
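When the feature vectors contain only 0s and 1s, the full connection calculation for a word reduces to summing the weights of the word's own n-ary characters and applying the activation. The sketch below uses this observation to produce a word vector for every word in a vocabulary. The weight values stand in for trained parameters and are made up for illustration; the names `word_vector` and `active_indexes`, and the use of the Sigmoid activation, are assumptions.

```python
import math

def word_vector(active_indexes, weights, bias):
    """Full connection output for a 0/1 feature vector given by its nonzero indexes."""
    return [1.0 / (1.0 + math.exp(-(sum(row[j] for j in active_indexes) + b)))
            for row, b in zip(weights, bias)]

# Placeholder "trained" parameters: N_c = 4 n-ary characters, 2 output dimensions.
weights = [[0.2, -0.1, 0.0, 0.5], [0.0, 0.3, -0.3, 0.1]]
bias = [0.0, 0.1]

# Indexes of the n-ary characters of each word (hypothetical).
active_indexes = {"boy": [0, 3], "toy": [1, 2]}
word_vectors = {word: word_vector(idx, weights, bias)
                for word, idx in active_indexes.items()}
print(word_vectors)
```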

Based on the same idea, some implementations of the present specification provide another method for generating word vectors, which is an example implementation of the method for generating word vectors in FIG. 2. FIG. 5 is a schematic flowchart illustrating such a method for generating word vectors.

The process in FIG. 5 includes the following steps:

Step 1: Establishing a vocabulary of words obtained by segmenting a corpus, where the obtained words exclude any word that appears fewer than a predetermined or dynamically determined number of times in the corpus; going to step 2.

Step 2: Determining a total number of n-ary characters corresponding to all of the obtained words, where identical n-ary characters are counted once, and each n-ary character represents n consecutive characters of an obtained word corresponding to the n-ary character; going to step 3.

Step 3: Establishing, for each obtained word, based on each n-ary character corresponding to the obtained word, a feature vector whose dimensionality is the total number, where each dimension of the feature vector corresponds to a different n-ary character, and the value of each dimension indicates whether a corresponding n-ary character is mapped to the obtained word corresponding to the feature vector; going to step 4.

Step 4: Traversing the corpus on which word segmentation is performed, and performing step 5 with respect to the current word accessed during the traversal, and if the traversal is completed, performing step 6; otherwise continuing the traversal.

Step 5: Using the current word as a center, sliding towards both sides for at most k words to establish a window, and using the words in the window except the current word as context words; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation, and inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word and a feature vector of a negative sample word selected in the corpus into the full connection layer of the convolutional neural network for calculation to obtain a second vector and a third vector; and updating parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and a specified loss function, where the convolutional calculation is performed according to the following formula:

$\tilde{x}_i = x_{i:i+\theta-1} = [x_i^T, x_{i+1}^T, \ldots, x_{i+\theta-1}^T]^T$

$y_i = \sigma(\omega \tilde{x}_i + \zeta)$

the pooling calculation is performed according to the following formula:

$c(j) = \max_{i=1,2,\ldots,t-\theta+1}\{y_i(j)\}$, or $c(j) = \operatorname{average}_{i=1,2,\ldots,t-\theta+1}\{y_i(j)\}$

and the loss function includes:

$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left(1 + \sum_{m=1}^{\lambda} \exp\left(-\gamma \cdot \left(s(w, c) - s(w'_m, c)\right)\right)\right)$

where $x_i$ indicates a feature vector of the $i$-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the $i$-th to the $(i+\theta-1)$-th context words; $y_i$ indicates the $i$-th element in the vector that is obtained through the convolutional calculation; $\omega$ indicates weight parameter(s) of the convolutional layer; $\zeta$ indicates offset parameter(s) of the convolutional layer; $\sigma$ indicates an activation function; max indicates a maximum value function; average indicates an average value function; $c(j)$ indicates the $j$-th element in the first vector that is obtained through pooling calculation; $t$ indicates the number of context words; $c$ indicates the first vector; $w$ indicates the second vector; $w'_m$ indicates the third vector corresponding to the $m$-th negative sample word; $\varsigma$ indicates weight parameter(s) of the full connection layer; $\tau$ indicates offset parameter(s) of the full connection layer; $\gamma$ indicates a hyperparameter; $s$ indicates a similarity calculation function; and $\lambda$ indicates the number of negative sample words.

Step 6: Inputting the feature vector of each obtained word into the full connection layer of the trained convolutional neural network for calculation to obtain corresponding word vectors.
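As a non-limiting illustration of the window construction in step 5, the following Python sketch collects the context words of a current word. The function name `context_window`, the list representation of the segmented corpus, and the sample tokens are assumptions made for illustration only and are not part of the present specification.

```python
def context_window(words, index, k):
    """Return the words within at most k positions on each side of words[index]."""
    start = max(0, index - k)             # the window is clipped at the start of the corpus
    end = min(len(words), index + k + 1)  # and at the end of the corpus
    return words[start:index] + words[index + 1:end]  # the current word is excluded


# Traverse a toy segmented corpus (step 4) and build the window for each current word.
corpus = ["the", "cat", "sat", "on", "mat"]
windows = [context_window(corpus, i, k=2) for i in range(len(corpus))]
```

Near the boundaries of the corpus the window holds fewer than 2k words, which is consistent with sliding "for at most k words".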

The steps in this method for generating word vectors can be performed by the same or different modules, which is not specifically limited in the present specification.

The foregoing has described the method for generating word vectors according to some implementations of the present specification. Based on the same idea, some implementations of the present specification further provide a corresponding apparatus, as shown in FIG. 6.

FIG. 6 is a schematic structural diagram illustrating an apparatus for generating word vectors that corresponds to FIG. 2 according to some implementations of the present specification. The apparatus can be located in the entity that executes the process in FIG. 2 and includes: an acquisition module 601, configured to obtain individual words by segmenting a corpus; an establishment module 602, configured to establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of the word corresponding to the n-ary character; a training module 603, configured to train a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and a generation module 604, configured to generate a word vector for each obtained word based on the feature vector of each obtained word and the trained convolutional neural network.

Optionally, characters of a word include each character constituting the word, and mark characters added to the start position and/or the end position of the word.
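As a non-limiting illustration, the extraction of n-ary characters from a word with mark characters can be sketched in Python as follows. The function name `char_ngrams`, the choice of "#" as the mark character, and the default n = 3 are assumptions made for illustration only.

```python
def char_ngrams(word, n=3, start_mark="#", end_mark="#"):
    """Return the n-ary characters of a word after adding mark characters."""
    marked = start_mark + word + end_mark
    # Each n-ary character is a run of n consecutive characters of the marked word.
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


ngrams = char_ngrams("object")
# ngrams == ["#ob", "obj", "bje", "jec", "ect", "ct#"]
```

Passing an empty string for either mark omits the mark at that position, which corresponds to adding marks at only one end of the word.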

Optionally, the establishment module 602's establishing of a feature vector of each obtained word based on each n-ary character corresponding to each word specifically includes: determining, by the establishment module 602, a total number of distinct n-ary characters in a collection of respective n-ary characters corresponding to each of the obtained words; and establishing, for each word, a feature vector whose dimensionality is determined based on the total number, where values can be assigned to different dimensions of the feature vector to indicate the n-ary characters corresponding to the obtained word.
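As a non-limiting illustration, the counting of distinct n-ary characters and the establishing of feature vectors can be sketched in Python as follows. The function name `build_feature_vectors`, the sorted ordering of dimensions, and the use of 1 and 0 as dimension values are assumptions made for illustration only.

```python
def build_feature_vectors(word_ngrams):
    """Map each word to a vector with one dimension per distinct n-ary character.

    word_ngrams: dict mapping each obtained word to a list of its n-ary characters.
    """
    # Identical n-ary characters are counted once across all obtained words.
    vocab = sorted({g for grams in word_ngrams.values() for g in grams})
    index = {g: i for i, g in enumerate(vocab)}
    vectors = {}
    for word, grams in word_ngrams.items():
        vec = [0] * len(vocab)     # dimensionality equals the total number
        for g in grams:
            vec[index[g]] = 1      # 1 marks an n-ary character mapped to this word
        vectors[word] = vec
    return vocab, vectors


vocab, vectors = build_feature_vectors({"ab": ["#a", "ab", "b#"], "b": ["#b", "b#"]})
```

In this toy example the two words share the n-ary character "b#", so the total number is 4 rather than 5.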

Optionally, the training module 603's training of the convolutional neural network based on the feature vectors of the obtained words and the feature vectors of the context words associated with each obtained word in the corpus specifically includes:

training, by the training module 603, the convolutional neural network based on the feature vectors of the obtained words and the feature vectors of the context words and the negative sample words associated with the words in the corpus.

Optionally, the training module 603's training of the convolutional neural network based on the feature vectors of the obtained words and the feature vectors of the context words and the negative sample words associated with the words in the corpus specifically includes:

traversing, by the training module 603, the corpus after word segmentation, and executing the following when the current word is accessed during the traversal: determining one or more context words and negative sample words associated with the current word in the corpus after word segmentation; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation; inputting the result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word into the full connection layer of the convolutional neural network for calculation to obtain a second vector, and inputting a feature vector of a negative sample word associated with the current word into the full connection layer of the convolutional neural network for calculation to obtain a third vector; and updating parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and a specified loss function.
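As a non-limiting illustration, the determining of negative sample words for a current word can be sketched in Python as follows. This passage does not fix a selection strategy; uniform random selection, the exclusion of the current word and its context words, and the function name `sample_negatives` are assumptions made for illustration only.

```python
import random


def sample_negatives(words, current, context, count, rng=None):
    """Pick `count` negative sample words from the segmented corpus.

    Assumption: a negative sample is any corpus word that is neither the
    current word nor one of its context words, drawn uniformly at random.
    """
    rng = rng or random.Random()
    excluded = set(context) | {current}
    candidates = [w for w in words if w not in excluded]
    if not candidates:
        return []
    return [rng.choice(candidates) for _ in range(count)]


negatives = sample_negatives(["a", "b", "c", "d", "e"], "c", ["b", "d"], 3, random.Random(0))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which can be convenient when comparing training runs.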

Optionally, the training module 603's performing of a convolutional calculation specifically includes:

performing, by the training module 603, the convolutional calculation according to the following formula:

$\tilde{x}_i = x_{i:i+\theta-1} = [x_i^T, x_{i+1}^T, \ldots, x_{i+\theta-1}^T]^T$

$y_i = \sigma(\omega \tilde{x}_i + \zeta)$

where $x_i$ indicates a feature vector of the $i$-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the $i$-th to the $(i+\theta-1)$-th context words; $y_i$ indicates the $i$-th element in the vector that is obtained through the convolutional calculation; $\omega$ indicates weight parameter(s) of the convolutional layer; $\zeta$ indicates offset parameter(s) of the convolutional layer; and $\sigma$ indicates an activation function.
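As a non-limiting illustration, the convolutional calculation over concatenated context feature vectors can be sketched in Python as follows. The function name `convolve`, the representation of the weight parameter(s) as a list of rows, the treatment of each output as a vector with one element per row, and the default activation `math.tanh` are assumptions made for illustration only.

```python
import math


def convolve(context_vectors, weights, offsets, theta, activation=math.tanh):
    """Apply y_i = sigma(omega * x~_i + zeta) to each window of theta context vectors.

    context_vectors: list of t feature vectors, each a list of numbers of length d
    weights: omega, a list of rows, each with theta * d entries
    offsets: zeta, one entry per row of weights
    """
    outputs = []
    for i in range(len(context_vectors) - theta + 1):
        # x~_i: concatenation of the i-th to the (i + theta - 1)-th feature vectors
        window = [v for vec in context_vectors[i:i + theta] for v in vec]
        y_i = [activation(sum(w * x for w, x in zip(row, window)) + b)
               for row, b in zip(weights, offsets)]
        outputs.append(y_i)
    return outputs


# Toy example with an identity activation so the arithmetic is easy to follow.
ys = convolve([[1, 0], [0, 1], [1, 1]], weights=[[1, 1, 1, 1]], offsets=[0],
              theta=2, activation=lambda z: z)
```

With t context words and a window of theta words, the sketch produces t - theta + 1 outputs, matching the index range used in the pooling formula.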

Optionally, the training module 603's performing of a pooling calculation specifically includes:

performing, by the training module 603, maximum pooling calculation or average pooling calculation.
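As a non-limiting illustration, maximum pooling and average pooling over the outputs of the convolutional calculation can be sketched in Python as follows. The function name `pool` and the list-of-lists input layout are assumptions made for illustration only.

```python
def pool(conv_outputs, mode="max"):
    """Reduce the convolution outputs y_1..y_(t-theta+1) to the first vector c.

    conv_outputs: list of equal-length lists; c(j) is computed over the
    j-th element of every output.
    """
    columns = list(zip(*conv_outputs))  # group the j-th elements together
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError("mode must be 'max' or 'average'")


c_max = pool([[1, 4], [3, 2]])             # [3, 4]
c_avg = pool([[1, 4], [3, 2]], "average")  # [2.0, 3.0]
```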

Optionally, the training module 603's updating of the parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and the specified loss function specifically includes: calculating, by the training module 603, a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and updating the parameters of the convolutional neural network based on the first similarity, the second similarity, and the specified loss function.
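As a non-limiting illustration, the first similarity and the second similarity can be computed with a single similarity function, sketched in Python as follows. This passage does not name a particular similarity calculation; the use of cosine similarity and the function name `cosine_similarity` are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


first_vector = [1.0, 0.0]    # c, from the pooling layer
second_vector = [1.0, 0.0]   # w, from the current word
third_vector = [0.0, 1.0]    # w', from a negative sample word
first_similarity = cosine_similarity(second_vector, first_vector)
second_similarity = cosine_similarity(third_vector, first_vector)
```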

Optionally, the loss function specifically includes:

$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left(1 + \sum_{m=1}^{\lambda} \exp\left(-\gamma \cdot \left(s(w, c) - s(w'_m, c)\right)\right)\right)$

where $c$ indicates the first vector; $w$ indicates the second vector; $w'_m$ indicates the third vector corresponding to the $m$-th negative sample word; $\omega$ indicates weight parameter(s) of the convolutional layer; $\zeta$ indicates offset parameter(s) of the convolutional layer; $\varsigma$ indicates weight parameter(s) of the full connection layer; $\tau$ indicates offset parameter(s) of the full connection layer; $\gamma$ indicates a hyperparameter; $s$ indicates a similarity calculation function; and $\lambda$ indicates the number of negative sample words.
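As a non-limiting illustration, the value of the loss function can be computed from already-calculated similarities, sketched in Python as follows. The function name `loss` and the passing of similarities as plain numbers are assumptions made for illustration only; the sketch evaluates the loss and does not perform the parameter update.

```python
import math


def loss(s_pos, s_negs, gamma):
    """Evaluate log(1 + sum_m exp(-gamma * (s(w, c) - s(w'_m, c)))).

    s_pos: similarity s(w, c) between the second vector and the first vector
    s_negs: similarities s(w'_m, c), one per negative sample word (lambda values)
    gamma: the hyperparameter
    """
    return math.log(1 + sum(math.exp(-gamma * (s_pos - s_neg)) for s_neg in s_negs))


good = loss(1.0, [0.0], gamma=2.0)  # current word is closer to the context than the negative
bad = loss(0.0, [1.0], gamma=2.0)   # negative sample is closer than the current word
```

The loss decreases as the first similarity exceeds the second similarity by a larger margin, so minimizing it pushes the current word toward its context and the negative sample words away from it.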

Optionally, the generation module 604's generating of a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network specifically includes:

inputting, by the generation module 604, the feature vector of each obtained word into the full connection layer of the trained convolutional neural network for calculation to obtain a vector output after the calculation as a corresponding word vector.
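As a non-limiting illustration, the calculation of a word vector by the full connection layer can be sketched in Python as follows. The function name `full_connection`, the list-of-rows layout of the weight parameter(s), and the optional activation argument are assumptions made for illustration only; this passage does not state whether an activation function is applied in the full connection layer.

```python
def full_connection(feature_vector, weights, offsets, activation=None):
    """Map a feature vector to a word vector using trained layer parameters.

    weights: weight parameter(s) of the full connection layer, one row per output dimension
    offsets: offset parameter(s) of the full connection layer, one per output dimension
    activation: optional function applied to each output element (assumption)
    """
    out = [sum(w * x for w, x in zip(row, feature_vector)) + b
           for row, b in zip(weights, offsets)]
    return [activation(v) for v in out] if activation else out


word_vector = full_connection([1, 0, 1], [[1, 2, 3], [0, 1, 0]], [0.5, -1])
```

The dimensionality of the resulting word vector equals the number of rows in the weights, which is typically much smaller than the number of distinct n-ary characters.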

Based on the same idea, an implementation of the present specification further provides a device for generating word vectors, including: at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: obtain individual words by segmenting a corpus; establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character; train a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

Based on the same idea, an implementation of the present specification further provides a non-volatile computer storage medium, where the storage medium stores computer executable instructions that are used to: obtain individual words by segmenting a corpus; establish a feature vector for each obtained word based on n-ary characters corresponding to the obtained word, where each n-ary character represents n consecutive characters of a word corresponding to the n-ary character; train a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.

Specific implementations of the present specification are described above. Other implementations fall within the scope of the appended claims. In some situations, the actions or steps described in the claims can be performed in an order different from the order in the implementation and the desired results can still be achieved. In addition, the process depicted in the accompanying drawings does not necessarily require a particular execution order to achieve the desired results. In some implementations, multi-tasking and parallel processing can be advantageous.

The implementations of the present specification are described in a progressive way. For same or similar parts of the implementations, mutual references can be made to the implementations. Each implementation focuses on a difference from the other implementations. In particular, because the implementations of the apparatus, the device, and the non-volatile computer storage medium are basically similar to the method implementation, they are described relatively briefly, and references can be made to the corresponding parts of the method implementation descriptions.

The apparatus, device, and non-volatile computer storage medium provided in the implementations of the present specification correspond to the method. Therefore, the apparatus, device, and non-volatile computer storage medium also have beneficial technical effects that are similar to those of the corresponding method. Because the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus, device, and non-volatile computer storage medium are omitted here for simplicity.

In the 1990s, whether technology improvement was hardware improvement (for example, improvement of a circuit structure, such as a diode, a transistor, or a switch) or software improvement (improvement of a method procedure) could be clearly distinguished. However, as technologies develop, the current improvement for many method procedures can be considered as a direct improvement of a hardware circuit structure. A designer usually programs an improved method procedure to a hardware circuit, to obtain a corresponding hardware circuit structure. Therefore, a method procedure can be improved using a hardware entity module. For example, a programmable logic device (PLD) (for example, a field programmable gate array (FPGA)) is such an integrated circuit, and a logical function of the programmable logic device is determined by a user through device programming. The designer performs programming to “integrate” a digital system to a PLD without requesting a chip manufacturer to design and produce an application-specific integrated circuit chip. In addition, at present, instead of manually manufacturing an integrated chip, this type of programming is mostly implemented using “logic compiler” software. The logic compiler is similar to a software compiler used to develop and write a program. Original code needs to be written in a particular programming language for compilation. The language is referred to as a hardware description language (HDL). There are many HDLs, such as the Advanced Boolean Expression Language (ABEL), the Altera Hardware Description Language (AHDL), Confluence, the Cornell University Programming Language (CUPL), HDCal, the Java Hardware Description Language (JHDL), Lava, Lola, MyHDL, PALASM, and the Ruby Hardware Description Language (RHDL). The very-high-speed integrated circuit hardware description language (VHDL) and Verilog are most commonly used. A person skilled in the art should also understand that a hardware circuit that implements a logical method procedure can be readily obtained once the method procedure is logically programmed using the several described hardware description languages and is programmed into an integrated circuit.

A controller can be implemented using any appropriate method. For example, the controller can be a microprocessor or a processor, or a computer-readable medium that stores computer readable program code (such as software or firmware) that can be executed by the microprocessor or the processor, a logic gate, a switch, an application-specific integrated circuit (ASIC), a programmable logic controller, or a built-in microprocessor. Examples of the controller include but are not limited to the following microprocessors: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. The memory controller can also be implemented as a part of the control logic of the memory. A person skilled in the art also knows that, in addition to implementing the controller by using the computer readable program code, logic programming can be performed on method steps to allow the controller to implement the same function in forms of the logic gate, the switch, the application-specific integrated circuit, the programmable logic controller, and the built-in microcontroller. Therefore, the controller can be considered as a hardware component, and a device configured to implement various functions in the controller can also be considered as a structure in the hardware component. Alternatively, the device configured to implement various functions can even be considered as both a software module implementing the method and a structure in the hardware component.

The system, device, module, or unit illustrated in the previous implementations can be implemented using a computer chip or an entity, or can be implemented using a product having a certain function. A typical implementation device is a computer. A specific form of the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver device, a game console, a tablet computer, a wearable device, or any combination thereof.

For convenience of description, the above devices are described separately in terms of their functions. Certainly, functions of the units can be implemented in the same or different software or hardware when the present specification is implemented.

The present specification is described with reference to at least one of a flowchart or block diagram of the method, device (system), and computer program product according to the implementations of the present specification. It is worthwhile to note that computer program instructions can be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions can be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of another programmable data processing device to generate a machine, so the instructions executed by the computer or the processor of the other programmable data processing device generate a device for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

These computer program instructions can be stored in a computer readable memory that can instruct the computer or the other programmable data processing device to work in a specific way, so the instructions stored in the computer readable memory generate an artifact that includes an instruction device. The instruction device implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

These computer program instructions can be loaded onto the computer or another programmable data processing device, so that a series of operations and steps are performed on the computer or the other programmable device, thereby generating computer-implemented processing. Therefore, the instructions executed on the computer or the other programmable device provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.

The memory can include a non-persistent memory, a random access memory (RAM), a non-volatile memory, and/or another form that are in a computer readable medium, for example, a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of the computer readable medium.

The computer readable medium includes persistent, non-persistent, movable, and unmovable media that can store information by using any method or technology. The information can be a computer readable instruction, a data structure, a program module, or other data. Examples of the computer storage medium include but are not limited to a phase change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), another type of RAM, a ROM, an electrically erasable programmable read-only memory (EEPROM), a flash memory or another memory technology, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or another optical storage, a cassette magnetic tape, a magnetic tape/magnetic disk storage, another magnetic storage device, or any other non-transmission medium. The computer storage medium can be used to store information accessible by a computing device. Based on the definition in the present specification, the computer readable medium does not include transitory media such as a modulated data signal and carrier.

It is also worthwhile to note that the terms “include”, “comprise”, or any other variant thereof are intended to cover non-exclusive inclusion, so that processes, methods, commodities or devices that include a series of elements include not only those elements but also other elements that are not explicitly listed, or elements inherent in such processes, methods, commodities or devices. An element described by “includes a . . . ” further includes, without more constraints, another identical element in the process, method, product, or device that includes the element.

A person skilled in the art should understand that the implementations of the present specification can be provided as methods, systems or computer program products. Therefore, the present specification can take a form of complete hardware implementations, complete software implementations, or implementations combining software and hardware. Further, the present specification can take a form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present specification can be described in the general context of computer executable instructions executed by a computer, for example, a program module. Generally, the program module includes a routine, a program, an object, a component, a data structure, etc. executing a specific task or implementing a specific abstract data type. The present specification can also be practiced in distributed computing environments. In the distributed computing environments, tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, the program module can be located in both local and remote computer storage media including storage devices.

The implementations of the present specification are described in a progressive way. For same or similar parts of the implementations, mutual references can be made to the implementations. Each implementation focuses on a difference from the other implementations. Particularly, a system implementation is basically similar to a method implementation, and therefore is described briefly. For related parts, references can be made to related descriptions in the method implementation.

The above descriptions are merely examples of the present specification and are not intended to limit the present application. For a person skilled in the art, the present application can be subject to various modifications and variations. Any modification, equivalent replacement or improvement made within spirit and principles of the present application shall be included in claims of the present application.

The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.

These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

What is claimed is:
 1. A method for generating word vectors, comprising: obtaining words by segmenting a corpus; establishing a feature vector for each obtained word based, at least in part, on one or more n-ary characters corresponding to the obtained word, wherein each n-ary character represents n consecutive characters of a word; training a convolutional neural network based, at least in part, on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus by: traversing the corpus after word segmentation, and performing the following actions responsive to a current word being accessed during the traversing: determining one or more context words and one or more negative sample words associated with the current word in the corpus; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation; inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word into a full connection layer of the convolutional neural network for calculation to obtain a second vector; and inputting a feature vector of a negative sample word associated with the current word into the full connection layer of the convolutional neural network for calculation to obtain a third vector; and updating parameters of the convolutional neural network based, at least in part, on the first vector, the second vector, the third vector, and a specified loss function; and generating a word vector for each obtained word based, at least in part, on the feature vector of the obtained word and the trained convolutional neural network.
 2. The method according to claim 1, wherein characters of an obtained word comprise each character constituting the obtained word, and mark characters added to one or more of a start position and an end position of the obtained word.
 3. The method according to claim 1, wherein the establishing the feature vector for each obtained word based, at least in part, on one or more n-ary characters corresponding to the obtained word comprises: determining a total number of distinct n-ary characters in a collection of respective n-ary characters corresponding to each of the obtained words; and establishing, for each obtained word, a feature vector whose dimensionality is determined based, at least in part, on the total number.
 4. The method according to claim 3, wherein the number of dimensions of the feature vector is the total number.
 5. The method according to claim 3, wherein values are assigned to individual dimensions of the feature vector to indicate a mapping between n-ary characters and the obtained word.
 6. The method according to claim 1, further comprising performing the convolutional calculation, at least in part, by concatenating the feature vectors of a subset of the context words associated with the current word.
 7. The method according to claim 1, further comprising performing the pooling calculation by: performing at least one of a maximum pooling calculation or an average pooling calculation.
 8. The method according to claim 1, wherein updating the parameters of the convolutional neural network based, at least in part, on the first vector, the second vector, the third vector, and the specified loss function comprises: calculating a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and updating the parameters of the convolutional neural network based, at least in part, on the first similarity, the second similarity, and the specified loss function.
 9. The method according to claim 1, wherein the loss function includes at least a weight parameter of the convolutional layer, an offset parameter of the convolutional layer, a weight parameter of the full connection layer, an offset parameter of the full connection layer, a hyperparameter, and a similarity calculation function.
 10. The method according to claim 1, wherein the generating a word vector for each obtained word based, at least in part, on the feature vector of the obtained word and the trained convolutional neural network comprises: inputting the feature vector of each obtained word into the full connection layer of the trained convolutional neural network to obtain a vector output.
 11. An apparatus for generating word vectors, comprising: an acquisition module, configured to obtain words by segmenting a corpus; an establishment module, configured to establish a feature vector for each obtained word based, at least in part, on one or more n-ary characters corresponding to the obtained word, wherein each n-ary character indicates n consecutive characters of a word; a training module, configured to train a convolutional neural network based, at least in part, on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus by: traversing the corpus after word segmentation, and performing the following actions responsive to a current word being accessed during the traversal: determining one or more context words and one or more negative sample words associated with the current word in the corpus; inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation; inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word into a full connection layer of the convolutional neural network for calculation to obtain a second vector; and inputting a feature vector of a negative sample word associated with the current word into the full connection layer of the convolutional neural network for calculation to obtain a third vector; and updating parameters of the convolutional neural network based, at least in part, on the first vector, the second vector, the third vector, and a specified loss function; and a generation module, configured to generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.
 12. The apparatus according to claim 11, wherein characters of an obtained word comprise each character constituting the obtained word, and mark characters added to a start position and/or an end position of the obtained word.
 13. The apparatus according to claim 11, wherein the establishing, by the establishment module, of a feature vector for each obtained word based, at least in part, on one or more n-ary characters corresponding to the obtained word comprises: determining, by the establishment module, a total number of distinct n-ary characters in a collection of respective n-ary characters corresponding to each of the obtained words; and establishing, for each obtained word, a feature vector whose dimensionality is determined based, at least in part, on the total number.
 14. The apparatus according to claim 13, wherein the number of dimensions of the feature vector is the total number.
 15. The apparatus according to claim 13, wherein values are assigned to individual dimensions of the feature vector to indicate a mapping between n-ary characters and the obtained word.
 16. The apparatus according to claim 11, wherein performing the convolutional calculation by the training module comprises: performing, by the training module, the convolutional calculation according to the following formula: $\tilde{x}_i = x_{i:i+\theta-1} = [x_i^T, x_{i+1}^T, \ldots, x_{i+\theta-1}^T]^T$, $y_i = \sigma(\omega \tilde{x}_i + \zeta)$, wherein $x_i$ indicates a feature vector of the $i$-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the $i$-th to the $(i+\theta-1)$-th context words; $y_i$ indicates the $i$-th element in the vector that is obtained through the convolutional calculation; $\omega$ indicates one or more weight parameters of the convolutional layer; $\zeta$ indicates one or more offset parameters of the convolutional layer; and $\sigma$ indicates an activation function.
 17. The apparatus according to claim 11, wherein performing the pooling calculation by the training module comprises: performing, by the training module, at least one of a maximum pooling calculation or an average pooling calculation.
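
By way of illustration only, maximum pooling and average pooling over the outputs of the convolutional layer might be sketched as follows. The sketch assumes that the convolutional result is a list of equal-length vectors y_i and that pooling is applied element by element across them; the function name and the `mode` parameter are illustrative.

```python
def pool(conv_outputs, mode="max"):
    """Reduce the list of y_i vectors to a single vector c, element by element."""
    # Column j collects y_1(j), y_2(j), ... across all window positions.
    columns = list(zip(*conv_outputs))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")
```

Either mode returns a vector whose length does not depend on the number of context words, which is what allows it to serve as the first vector.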
 18. The apparatus according to claim 11, wherein the updating, by the training module, of the parameters of the convolutional neural network based, at least in part, on the first vector, the second vector, the third vector, and the specified loss function comprises:
calculating, by the training module, a first similarity between the second vector and the first vector, and a second similarity between the third vector and the first vector; and
updating the parameters of the convolutional neural network based, at least in part, on the first similarity, the second similarity, and the specified loss function.
 19. The apparatus according to claim 11, wherein the loss function comprises:

$$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left( 1 + \sum_{m=1}^{\lambda} \exp\left( -\gamma \cdot \left( s(w, c) - s(w'_m, c) \right) \right) \right)$$

wherein $c$ indicates the first vector; $w$ indicates the second vector; $w'_m$ indicates the third vector corresponding to the $m$-th negative sample word; $\omega$ indicates one or more weight parameters of the convolutional layer; $\zeta$ indicates one or more offset parameters of the convolutional layer; $\varsigma$ indicates one or more weight parameters of the full connection layer; $\tau$ indicates one or more offset parameters of the full connection layer; $\gamma$ indicates a hyperparameter; $s$ indicates a similarity calculation function; and $\lambda$ indicates the number of negative sample words.
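
By way of illustration only, the loss function of claim 19 might be evaluated as follows. The sketch assumes cosine similarity for the similarity calculation function s and a default of γ = 1; the claim specifies neither, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed choice for s."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def loss(w, c, negatives, gamma=1.0, s=cosine):
    """Evaluate log(1 + sum_m exp(-gamma * (s(w, c) - s(w'_m, c))))."""
    positive = s(w, c)  # first similarity: second vector vs. first vector
    total = sum(
        math.exp(-gamma * (positive - s(w_neg, c)))  # second similarity inside
        for w_neg in negatives
    )
    return math.log(1.0 + total)
```

The loss shrinks as the current word becomes more similar to its context than the negative sample words are, which is the direction in which the parameter update pushes.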
 20. The apparatus according to claim 11, wherein the generating, by the generation module, of a word vector for each obtained word based, at least in part, on the feature vector of the obtained word and the trained convolutional neural network comprises: inputting, by the generation module, the feature vector of each obtained word into the full connection layer of the trained convolutional neural network to obtain a vector output.
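
By way of illustration only, the generation step of claim 20 might be sketched as follows. The claims do not state the form of the full connection layer; the sketch assumes an affine map with weight parameters ς (`weights`) and offset parameters τ (`offsets`) and no activation function, and all names are illustrative.

```python
def full_connection(x, weights, offsets):
    """Return weights * x + offsets for one feature vector x (assumed form)."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, offsets)
    ]


def generate_word_vectors(feature_vectors, weights, offsets):
    """Pass every word's feature vector through the already trained layer."""
    return {
        word: full_connection(x, weights, offsets)
        for word, x in feature_vectors.items()
    }
```

Because only the full connection layer is used at this stage, a word vector can be produced for any word that has a feature vector, without reference to its context words.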
 21. A method for generating word vectors, comprising:
Act 1: establishing a vocabulary of words obtained by segmenting a corpus, wherein the obtained words exclude a word that appears fewer than a predetermined number of times in the corpus;
Act 2: determining a total number of n-ary characters corresponding to all of the obtained words, wherein same n-ary characters are counted once, and each n-ary character represents n consecutive characters of an obtained word corresponding to the n-ary character;
Act 3: establishing, for each obtained word, based on each n-ary character corresponding to the obtained word, a feature vector whose dimensionality is the total number, wherein each dimension of the feature vector corresponds to a different n-ary character, and the value of each dimension indicates whether a corresponding n-ary character is mapped to the obtained word corresponding to the feature vector;
Act 4: traversing the corpus on which word segmentation is performed, and performing Act 5 with respect to the current word accessed during the traversal, and if the traversal is completed, performing Act 6; otherwise continuing the traversal;
Act 5: using the current word as a center, sliding towards both sides for at most k words to establish a window, and using the words in the window except the current word as context words, inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation, and inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector; inputting a feature vector of the current word and a feature vector of a negative sample word selected in the corpus into the full connection layer of the convolutional neural network for calculation to obtain a second vector and a third vector; and updating parameters of the convolutional neural network based on the first vector, the second vector, the third vector, and a specified loss function, wherein the convolutional calculation is performed according to the following formula:

$$\tilde{x}_i = x_{i:i+\theta-1} = \left[ x_i^T, x_{i+1}^T, \ldots, x_{i+\theta-1}^T \right]^T$$

$$y_i = \sigma\left( \omega \tilde{x}_i + \zeta \right)$$

the pooling calculation is performed according to the following formula:

$$c(j) = \max_{i = 1, 2, \ldots, t-\theta+1} \{ y_i(j) \},$$

or

$$c(j) = \underset{i = 1, 2, \ldots, t-\theta+1}{\operatorname{average}} \{ y_i(j) \}$$

and the loss function comprises:

$$l(w, c; \omega, \zeta, \varsigma, \tau) = \log\left( 1 + \sum_{m=1}^{\lambda} \exp\left( -\gamma \cdot \left( s(w, c) - s(w'_m, c) \right) \right) \right)$$

wherein $x_i$ indicates a feature vector of the $i$-th context word; $x_{i:i+\theta-1}$ indicates a vector that is obtained by concatenating the feature vectors of the $i$-th to the $(i+\theta-1)$-th context words; $y_i$ indicates the $i$-th element in the vector that is obtained through the convolutional calculation; $\omega$ indicates weight parameters of the convolutional layer; $\zeta$ indicates offset parameters of the convolutional layer; $\sigma$ indicates an activation function; max indicates a maximum value function; average indicates an average value function; $c(j)$ indicates the $j$-th element in the first vector that is obtained through pooling calculation; $t$ indicates the number of context words; $c$ indicates the first vector; $w$ indicates the second vector; $w'_m$ indicates the third vector corresponding to the $m$-th negative sample word; $\omega$ indicates weight parameters of the convolutional layer; $\zeta$ indicates offset parameters of the convolutional layer; $\varsigma$ indicates weight parameters of the full connection layer; $\tau$ indicates offset parameters of the full connection layer; $\gamma$ indicates a hyperparameter; $s$ indicates a similarity calculation function; and $\lambda$ indicates the number of negative sample words; and
Act 6: inputting the feature vector of each obtained word into the full connection layer of the trained convolutional neural network for calculation to obtain corresponding word vectors.
 22. An apparatus for generating word vectors, comprising:
at least one processor, and
a memory communicatively connected to the at least one processor, the memory storing instructions that are executed by the at least one processor to cause the at least one processor to implement:
an acquisition module, configured to obtain words by segmenting a corpus;
an establishment module, configured to establish a feature vector for each obtained word based, at least in part, on one or more n-ary characters corresponding to the obtained word, wherein each n-ary character indicates n consecutive characters of a word;
a training module, configured to train a convolutional neural network based, at least in part, on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus by:
traversing the corpus after word segmentation, and performing the following actions responsive to a current word being accessed during the traversal:
determining one or more context words and one or more negative sample words associated with the current word in the corpus;
inputting feature vectors of the context words associated with the current word into a convolutional layer of the convolutional neural network for convolutional calculation;
inputting a result of the convolutional calculation into a pooling layer of the convolutional neural network for pooling calculation to obtain a first vector;
inputting a feature vector of the current word into a full connection layer of the convolutional neural network for calculation to obtain a second vector; and
inputting a feature vector of a negative sample word associated with the current word into the full connection layer of the convolutional neural network for calculation to obtain a third vector; and
updating parameters of the convolutional neural network based, at least in part, on the first vector, the second vector, the third vector, and a specified loss function; and
a generation module, configured to generate a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.